DNA METHYLATION PATTERNS


Field of the Invention

The present invention relates to genomics and in particular to a method for the rapid
assessment of the genome-wide distribution of DNA methylation using restriction endonucleases
which are sensitive to methylation within their recognition site, in combination with size
fractionation of the restricted DNA and hybridization to a DNA chip.


Background 

DNA methylation is a ubiquitous biological process that occurs in diverse organisms
ranging from bacteria to humans. During this process, DNA methyltransferases catalyze
primarily the post-replicative addition of a methyl group to the N6 position of adenine or the C5
position of cytosine. In higher eukaryotes, DNA methylation plays a central role in the epigenetic
regulation of gene expression, in particular in transcriptional gene silencing, genomic
imprinting and embryonic development. Aberrations in DNA methylation have been implicated
in aging and in various diseases including cancer.

A method for determining the methylation state within a genomic sequence context, based
on restriction endonucleases that cleave only when their recognition sequence is
unmethylated, has been described by Bird et al., J. Mol. Biol. 118, 27-47, 1978. The fragments
resulting from cleavage of unmethylated recognition sites are separated by gel electrophoresis,
transferred to a membrane and hybridized to a labeled probe corresponding to the DNA fragment
to be examined. The resulting hybridization pattern reflects the methylation pattern of the DNA.
The sensitivity of this method was increased in a variant combined with PCR (Shemer, R. et al.,
Proc. Natl. Acad. Sci. USA, 93, 6371-6376, 1996). After digestion, amplification with two
primers located on either side of the recognition sequence only occurs if the recognition
sequence is in the methylated form and has therefore not been cleaved. Both variants examine
only the methylation of individual positions.

A new technology, called DNA microarray technology, is attracting increasing
interest among biologists. This technology provides the means to monitor almost whole genomes on
a single chip, so that researchers can analyze thousands of genes simultaneously. Terms
that have been used in the literature to describe this technology include, but are not limited to:
biochip, DNA chip, DNA microarray, probe array and gene array. Affymetrix, Inc. owns a
registered trademark, GeneChip®, which refers to its high density, oligonucleotide-based DNA
arrays. In the scientific literature, however, the term "gene chip(s)" is frequently used as a general
term that refers to DNA microarray technology.

An array is an orderly arrangement of DNA samples. It provides a medium for matching
known and unknown DNA samples based on base-pairing rules and for automating the process of
identifying the unknowns. An array experiment can make use of common assay systems such as
microplates or standard blotting membranes, and the array can be created by hand or by robots
that deposit the samples. In general, arrays are described as macroarrays or microarrays, the
difference being the size of the sample spots. Macroarrays contain sample spots of about
300 microns or larger and can easily be imaged by existing gel and blot scanners. The sample
spots in microarrays are typically less than 200 microns in diameter, and these arrays usually
contain thousands of spots. DNA microarrays, or DNA chips, are fabricated by robotics, generally
on glass but sometimes on nylon substrates; probes of known identity are used to
determine complementary binding, thus allowing massively parallel gene
expression and gene discovery studies. An experiment with a single DNA chip can provide
researchers with information on thousands of genes simultaneously. With regard to the terminology
of the hybridization partners, it should be noted that in connection with microarrays a "probe" is
frequently considered to be a tethered nucleic acid of known sequence, whereas a "target" is a
free nucleic acid sample whose identity or abundance is analyzed.

There are two major applications of microarray technology: (1) identification
of a nucleotide sequence such as a gene or gene mutation and (2) determination of the expression
level or abundance of a nucleotide sequence. There are also two variants of DNA microarray
technology, distinguished by the nature of the arrayed DNA sequences of known identity. In variant
1, probe DNAs of 500-5,000 bases in length, such as cDNAs, are immobilized on a solid surface such
as glass using robotic spotting and exposed to a set of targets either separately or in a mixture. In
variant 2, an array of oligonucleotide (20-80-mer) or peptide nucleic acid (PNA) probes
is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip
immobilization. The array is exposed to labeled sample DNA, hybridized, and the identity or
abundance of complementary sequences is determined. This method, "historically" called DNA
chips, is sold under the GeneChip® trademark. In contrast to cDNA spotting methods, in which a
single clone is used for the analysis, GeneChip® arrays use multiple probes (e.g. 16 probe pairs
for one gene in the case of the Arabidopsis chip) to interrogate a chromosomal region. This
probe-pairing strategy helps to identify and minimize the effects of non-specific hybridization
and background signals. A GeneChip® expression array can contain probes corresponding to a
number of reference and control genes. Using reference standards, it is possible to normalize
data from different experiments and compare multiple experiments on a quantitative level.

Some commercially available GeneChip probe arrays are manufactured using technology
that combines photolithographic methods and combinatorial chemistry. Tens to hundreds of
thousands of different oligonucleotide probes are synthesized on each array. Each probe type is
located in a specific area on the probe array called a probe cell. Each probe cell contains millions
of copies of a given probe. Probe arrays are manufactured in a series of cycles. A glass substrate
is coated with linkers containing photolabile protecting groups. Then, a mask is applied that
exposes selected portions of the probe array to ultraviolet light. Illumination removes the photolabile
protecting groups, enabling selective nucleoside phosphoramidite addition only at the
previously exposed sites. Next, a different mask is applied and the cycle of illumination and
chemical coupling is performed again. By repeating this cycle, a specific set of oligonucleotide
probes is synthesized, with each probe type in a known location. The completed probe arrays are
packaged into cartridges. Many companies manufacture oligonucleotide-based chips using
alternative in situ synthesis or deposition technologies.
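The mask-and-couple cycle can be illustrated with a toy scheduler. This is a sketch under simplifying assumptions, not a description of any manufacturer's process: bases are offered in the repeating order A, C, G, T, and each cycle's mask exposes exactly those probe cells whose next required base is the nucleotide being coupled. Under these assumptions every unfinished probe gains at least one base per round of four cycles, so probes of length L are complete after at most 4L cycles. The function name `synthesis_schedule` is hypothetical.

```python
from typing import Dict, List, Tuple


def synthesis_schedule(probes: Dict[str, str], order: str = "ACGT") -> List[Tuple[str, List[str]]]:
    """Plan light-directed synthesis cycles for a set of probe cells.

    Each returned cycle is (nucleotide, exposed_cells): the mask for that cycle
    illuminates only the cells whose next required base is `nucleotide`, so only
    those cells are deprotected and extended by one base.
    """
    for cell, seq in probes.items():
        if not seq or set(seq) - set(order):
            raise ValueError(f"probe {cell!r} must be a non-empty {order} sequence")
    coupled = {cell: 0 for cell in probes}  # bases already added per cell
    cycles: List[Tuple[str, List[str]]] = []
    while any(coupled[c] < len(s) for c, s in probes.items()):
        for base in order:  # one round = one pass over the deposition order
            exposed = [c for c, s in probes.items()
                       if coupled[c] < len(s) and s[coupled[c]] == base]
            if exposed:  # no mask is needed when no cell requires this base
                for c in exposed:
                    coupled[c] += 1
                cycles.append((base, exposed))
    return cycles


if __name__ == "__main__":
    for base, cells in synthesis_schedule({"cell1": "ACG", "cell2": "GTA"}):
        print(base, cells)
```

Skipping empty cycles mirrors the fact that a mask exposing nothing would waste a coupling step; real mask design also has to consider edge effects, which this toy model ignores.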

The present invention combines methylation-sensitive restriction enzymes,
DNA size fractionation and DNA microarray technology in a way that makes it feasible to
examine the level and distribution of DNA methylation on a genome-wide scale. It also
allows resolution down to a gene, gene fragment or any chosen DNA sequence, and the results can be
quantified. Although all types of microarrays can be used in the context of the
present invention, an important aspect is that the probe arrays are hybridized with a selected
size fraction of labeled genomic target DNA that has been restricted with a methylation-sensitive
endonuclease. Hybridization intensities obtained with different sources of genomic target DNA are then
compared to each other, and optionally to a control digestion with a methylation-insensitive
endonuclease, to identify sequences showing different levels of methylation. The intensities are
indicative of the composition of the particular size fraction and therefore of the methylation
status of the genomic targets. Thus, a comparatively high hybridization intensity of a probe
with a low molecular weight target fraction, resulting from cleavage with an endonuclease
inhibited by the presence of 5-methylcytosine or N6-methyladenine in the recognition sequence,
reflects hypomethylation in the target fraction relative to other samples or controls, because
unmethylated sites in the corresponding region have been cleaved into small fragments. Similarly, a
comparatively high hybridization intensity of a probe with a low molecular weight target
fraction, resulting from digestion with an endonuclease requiring the presence of 5-methylcytosine or
N6-methyladenine in the recognition sequence, reflects hypermethylation in the target fraction
relative to other samples or controls.
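The interpretation rule above can be written as a per-probe decision. The following is a minimal Python sketch, not part of the described method: it assumes already normalized intensities from the low molecular weight fraction of a test sample and a comparison sample, and the function name `classify_probe` and the two-fold cut-off (log2 ratio of 1.0) are hypothetical choices.

```python
import math


def classify_probe(sample: float, reference: float, blocked_by_methylation: bool,
                   min_log2_ratio: float = 1.0) -> str:
    """Interpret one probe's low molecular weight hybridization signal.

    sample, reference: normalized intensities for the same probe in the test and
    the comparison sample (hypothetical inputs; must be positive).
    blocked_by_methylation: True for enzymes such as HpaII whose cleavage is
    inhibited by methylation, False for enzymes that require methylation.
    min_log2_ratio: hypothetical cut-off; 1.0 means a two-fold difference.
    """
    if sample <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    log2_ratio = math.log2(sample / reference)
    if abs(log2_ratio) < min_log2_ratio:
        return "no difference"
    more_small_fragments = log2_ratio > 0  # more cleavage in the test sample
    if blocked_by_methylation:
        return "hypomethylated" if more_small_fragments else "hypermethylated"
    return "hypermethylated" if more_small_fragments else "hypomethylated"


print(classify_probe(800.0, 200.0, blocked_by_methylation=True))   # hypomethylated
print(classify_probe(800.0, 200.0, blocked_by_methylation=False))  # hypermethylated
```

The call is always relative to the comparison sample, which is why the same four-fold signal increase reads in opposite directions for the two enzyme classes.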
Summary

The present invention teaches a method to detect differences in genomic methylation
comprising

(a) separately cleaving different samples of genomic DNA with a sequence-specific
endonuclease whose cleavage activity is inhibited by or requires the presence of
5-methylcytosine or N6-methyladenine in the recognition sequence;

(b) labelling a defined size fraction of the resulting DNA fragments;

(c) separately hybridizing the labelled DNA fractions of step (b) to an array of DNA
molecules representing a plurality of genomic DNA targets; and

(d) quantifying the differences of the hybridization intensity patterns obtained in step (c).

Additionally, as a hybridization control, the method may optionally comprise

(i) cleaving identical samples of genomic DNA with a sequence-specific endonuclease
whose cleavage activity is indifferent to the presence of 5-methylcytosine or N6-methyladenine
in the recognition sequence and which is preferably an isoschizomer cleaving at the same
recognition site as the methylation-sensitive enzyme;

(ii) labelling the same size fraction of the resulting DNA fragments as in step (b); and

(iii) separately hybridizing the labelled DNA fractions of step (ii) to an identical array of
DNA molecules representing a plurality of genomic DNA targets.
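Steps (a) and (b), together with control steps (i) and (ii), can be simulated in silico for one sequence using the HpaII/MspI isoschizomer pair from the Examples. The sketch below rests on stated simplifications: a single strand is modelled, methylation is recorded as the start position of a CCGG site whose internal cytosine is methylated (CmCGG), HpaII is modelled as blocked by that methylation and MspI as cutting regardless (outer-cytosine methylation, which does affect MspI, is ignored), and both enzymes cut C^CGG. The function names are hypothetical.

```python
from typing import List, Set

SITE = "CCGG"    # recognition sequence shared by HpaII and MspI
CUT_OFFSET = 1   # both enzymes cut C^CGG


def digest(seq: str, methylated_sites: Set[int], blocked_by_methylation: bool) -> List[int]:
    """Return fragment lengths after cutting every eligible CCGG site on one strand.

    methylated_sites holds start positions of CCGG sites whose internal C is
    methylated. With blocked_by_methylation=True (HpaII-like) those sites are
    skipped; with False (MspI-like control) every site is cut.
    """
    cuts = []
    pos = seq.find(SITE)
    while pos != -1:
        if not (blocked_by_methylation and pos in methylated_sites):
            cuts.append(pos + CUT_OFFSET)
        pos = seq.find(SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def size_fraction(lengths: List[int], low: int, high: int) -> List[int]:
    """Keep the fragments inside the size window that would be eluted and labelled."""
    return [n for n in lengths if low <= n <= high]


if __name__ == "__main__":
    seq = "A" * 50 + "CCGG" + "A" * 100 + "CCGG" + "A" * 50  # sites start at 50 and 154
    methylated = {154}
    hpaii = digest(seq, methylated, blocked_by_methylation=True)
    mspi = digest(seq, methylated, blocked_by_methylation=False)
    print(hpaii, size_fraction(hpaii, 100, 2000))  # [51, 157] [157]
    print(mspi, size_fraction(mspi, 100, 2000))    # [51, 104, 53] [104]
```

Comparing the two outputs shows how methylation at a single site changes the composition of the labelled size fraction, which is the difference the control hybridization is meant to expose.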


Detailed Description 

The method is particularly useful for distinguishing different cell types on the basis of their
methylation pattern, which can be extremely useful in the context of cancer diagnosis and
treatment. The method, however, is not restricted to the analysis of mammalian genome
methylation, but can also be used for methylation pattern analysis in plants, animals, insects,
fungi or microbes. Thus, it can be used in plants to compare methylation patterns in the context
of heterosis. It is preferred that the DNA samples to be compared are of isogenic origin, such as
healthy versus tumour tissue of the same organism, parental DNA versus progeny or sibling
DNA, or DNA of isogenic organisms differing only by one or more specific mutations. In
general, the greater the genetic distance between the samples to be compared, the more difficult
the interpretation of the results becomes.

WO 03/023065                                PCT/US02/28529

A number of different endonucleases inhibited by the presence of 5-methylcytosine or
N6-methyladenine can be used in the context of the present invention. Their respective
recognition sequences are usually defined by 4 to 8 base pairs, shorter recognition sequences of 4
to 6 base pairs being preferred. A number of particularly useful endonucleases are listed in
Table 1. Endonucleases which are useful at sites of overlapping CG are shown in Table 2.
Particularly preferred examples of recognition sequences are ACGT, GCGC, CCGG, TCGA or
CGCG.
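The five particularly preferred recognition sequences (ACGT, GCGC, CCGG, TCGA, CGCG) share two properties that can be checked directly: each contains a CpG dinucleotide, and each is palindromic, that is, equal to its own reverse complement, so a scan of one strand finds every site on both strands. A minimal Python sketch, assuming plain A/C/G/T input; the function names are illustrative only.

```python
from typing import Dict, List, Sequence

COMPLEMENT = str.maketrans("ACGT", "TGCA")
PREFERRED_SITES = ("ACGT", "GCGC", "CCGG", "TCGA", "CGCG")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, sites: Sequence[str] = PREFERRED_SITES) -> Dict[str, List[int]]:
    """Return 0-based start positions of each site, overlapping hits included."""
    seq = seq.upper()
    return {site: [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
            for site in sites}


if __name__ == "__main__":
    for site in PREFERRED_SITES:
        print(site, "palindromic:", reverse_complement(site) == site, "contains CpG:", "CG" in site)
    print(find_sites("TTCCGGACGTT"))
```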






Table 1:

Enzyme    Site        Enzyme    Site        Enzyme    Site
AatII     GACGTC      BstUI     CGCG        NarI      GGCGCC
AciI      CCGC        ClaI      ATCGAT      NciI      CCSGG
AclI      AACGTT      EagI      CGGCCG      NgoMI     GCCGGC
AgeI      ACCGGT      Fnu4HI    GCNGC       NotI      GCGGCCGC
AscI      GGCGCGCC    FseI      GGCCGGCC    NruI      TCGCGA
AvaI      CYCGRG      FspI      TGCGCA      PmlI      CACGTG
BsaAI     YACGTR      HaeII     RGCGCY      PvuI      CGATCG
BsaHI     GRCGYC      HgaI      GACGC       RsrII     CGGWCCG
BsiEI     CGRYCG      HhaI      GCGC        SacII     CCGCGG
BsiWI     CGTACG      HinP1I    GCGC        SalI      GTCGAC
BspDI     ATCGAT      HpaII     CCGG        SmaI      CCCGGG
BsrFI     RCCGGY      KasI      GGCGCC      SnaBI     TACGTA
BssHII    GCGCGC      MluI      ACGCGT      XhoI      CTCGAG
BstBI     TTCGAA      NaeI      GCCGGC







Table 2:

Enzyme    Site        Enzyme    Site        Enzyme    Site
AccI      GTMKAC      BsmAI     GTCTC       NheI      GCTAGC
Acc65I    GGTACC      BspEI     TCCGGA      RsaI      GTAC
ApaI      GGGCCC      BsrBI     GAGCGG      PaeR7I    CTCGAG
ApaLI     GTGCAC      DraIII    CACNNNGTG   PshAI     GACNNNNGTC
AvaII     GGWCC       DrdI      GACN6GTC    SalI      GTCGAC
BanI      GGYRCC      EaeI      YGGCCR      Sau3AI    GATC
BsaI      GGTCTC      EarI      CTCTTC      Sau96I    GGNCC
BsaBI     GATN4ATC    HincII    GTYRAC      ScrFI     CCNGG
BsgI      GTGCAG      HinfI     GANTC       SfiI      GGCC(N5)GGCC
BslI      CCN7GG      HpaI      GTTAAC
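Many sites in Tables 1 and 2 use IUPAC degenerate codes (R = A/G, Y = C/T, S = C/G, W = A/T, M = A/C, K = G/T, N = any base) and spacer shorthand such as N6 or (N5). The following minimal Python sketch converts such a site into a regular expression for scanning a sequence. It assumes that the shorthand Nk means k consecutive N positions, and it scans only the strand given, so non-palindromic sites such as HgaI (GACGC) would also need a scan of the reverse complement. The helper names are illustrative.

```python
import re
from typing import List

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def expand_spacers(site: str) -> str:
    """Rewrite spacer shorthand such as N6 or (N5) as an explicit run of N."""
    return re.sub(r"\(?N(\d+)\)?", lambda m: "N" * int(m.group(1)), site)


def site_to_regex(site: str) -> "re.Pattern[str]":
    """Compile a degenerate recognition site; the lookahead allows overlapping hits."""
    expanded = expand_spacers(site.upper())
    return re.compile("(?=(" + "".join(IUPAC[base] for base in expanded) + "))")


def find_positions(site: str, seq: str) -> List[int]:
    """Return 0-based start positions of the site on the given strand only."""
    return [m.start() for m in site_to_regex(site).finditer(seq.upper())]


if __name__ == "__main__":
    print(expand_spacers("GGCC(N5)GGCC"))           # GGCCNNNNNGGCC
    print(find_positions("GCNGC", "GCAGCTGCTGC"))   # [0, 3, 6]
```

The zero-width lookahead is used so that overlapping occurrences, as in the Fnu4HI example, are all reported.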
The type of DNA label is only important in the sense that it needs to be compatible with
the hybridization technology required for the probes immobilized on the solid array supports. In
general, any method which can label DNA quantitatively can be used for this application,
including but not limited to random oligonucleotide priming, nick translation, chemical labelling
of DNA such as labelling with biotin, and light-activated chemical labelling of DNA such as
labelling with psoralen-biotin and activation by UV light. Means to perform these methods are
commercially available. It is also possible to label the target DNA on the chip by primer
extension using the chip oligonucleotides as primers. A preferred support is the Affymetrix GeneChip,
which can be hybridized to the target DNA according to the manufacturer's instructions.

The preferred size fraction to be labelled depends primarily on the length of the recognition
sequence cleaved by the endonuclease. Thus, for a recognition sequence of 4 base pairs the
preferred fragment size is up to 3000 base pairs, preferably between 100 and 2000 base
pairs. For a 6 base pair recognition sequence a fragment size of between 100 and 6000 base
pairs is suitable.
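Why the window scales with recognition-site length can be illustrated with an idealized model. Assuming independent bases with a given GC content (an idealization: CpG-containing sites are markedly depleted in many genomes, so real fragments are longer), a site of n fixed bases starts at a given position with probability p, the mean fragment length is about 1/p (256 bp for a 4-bp site and 4096 bp for a 6-bp site at 50% GC), and fragment lengths are approximately exponentially distributed, so the share of fragments in a window [a, b] is about exp(-a/mean) - exp(-b/mean). The function names are hypothetical.

```python
import math


def site_probability(site: str, gc_content: float = 0.5) -> float:
    """Probability that a given position starts `site`, assuming independent bases."""
    p = 1.0
    for base in site.upper():
        if base in "GC":
            p *= gc_content / 2
        elif base in "AT":
            p *= (1 - gc_content) / 2
        else:
            raise ValueError("this sketch handles only A, C, G and T")
    return p


def mean_fragment_length(site: str, gc_content: float = 0.5) -> float:
    """Average spacing between sites, i.e. the expected fragment length."""
    return 1 / site_probability(site, gc_content)


def fraction_in_window(mean_length: float, low: float, high: float) -> float:
    """Share of fragments with length in [low, high] under an exponential length model."""
    return math.exp(-low / mean_length) - math.exp(-high / mean_length)


if __name__ == "__main__":
    mean4 = mean_fragment_length("CCGG")    # 256.0 under the uniform model
    mean6 = mean_fragment_length("CCCGGG")  # 4096.0 under the uniform model
    print(fraction_in_window(mean4, 100, 2000), fraction_in_window(mean6, 100, 6000))
```

Under these assumptions the 100-2000 bp window holds roughly two thirds of the fragments from a 4-bp cutter, and the 100-6000 bp window roughly three quarters of those from a 6-bp cutter.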

The specific probes of the DNA array may represent any type of genomic DNA, that is,
coding sequences such as cDNA or non-coding sequences such as promoters, enhancers,
terminators, introns, transposons etc. In the context of methylation studies it is preferred that the
specific probe arrays represent non-coding sequences such as regulatory sequences of the
genome of an organism.

Experimental data resulting from the hybridization can be analyzed using computer
software. Affymetrix, for example, offers a program called GeneChip® Microarray Suite. For a
comparison between two chips, this program measures the 'baseline chip' intensity values and
normalizes them to the average signal intensity. Intensity values of the 'experimental chip' are
then compared to the baseline chip and a 'difference change' is calculated. The output of the
software provides a qualitative call ('increase', 'marginal increase', 'no change', 'decrease' or
'marginal decrease') as well as the discrete numerical metrics used to make the call. Expression
elements may be ranked based on the absolute level of expression or on the relative change in
expression in a two-chip comparison. All numerical data may be exported to a Microsoft Excel
spreadsheet for further analysis. Expression elements are identified by a GenBank accession
number, and the GeneChip analysis software provides a direct hyperlink to the GenBank
entry for full sequence annotation.
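A two-chip comparison of this kind can be approximated as below. This is not the Affymetrix algorithm: it is a minimal sketch assuming global scaling of the experimental chip to the mean intensity of the baseline chip, a per-probe log2 difference, and hypothetical cut-offs of 1.0 for a full call and 0.5 for a marginal call. Function names are illustrative.

```python
import math
from statistics import mean
from typing import Dict, Tuple


def scale_to_mean(intensities: Dict[str, float], target_mean: float) -> Dict[str, float]:
    """Global scaling: multiply every intensity so that the chip mean equals target_mean."""
    factor = target_mean / mean(intensities.values())
    return {probe: value * factor for probe, value in intensities.items()}


def compare_chips(baseline: Dict[str, float], experimental: Dict[str, float],
                  call_cutoff: float = 1.0,
                  marginal_cutoff: float = 0.5) -> Dict[str, Tuple[float, str]]:
    """Return {probe: (log2 difference, call)} for probes present on both chips."""
    scaled = scale_to_mean(experimental, mean(baseline.values()))
    results = {}
    for probe, base_value in baseline.items():
        if probe not in scaled:
            continue
        change = math.log2(scaled[probe] / base_value)
        if change >= call_cutoff:
            call = "increase"
        elif change >= marginal_cutoff:
            call = "marginal increase"
        elif change <= -call_cutoff:
            call = "decrease"
        elif change <= -marginal_cutoff:
            call = "marginal decrease"
        else:
            call = "no change"
        results[probe] = (change, call)
    return results


if __name__ == "__main__":
    baseline = {"a": 100, "b": 100, "c": 100, "d": 100}
    experimental = {"a": 400, "b": 200, "c": 140, "d": 60}
    for probe, (change, call) in compare_chips(baseline, experimental).items():
        print(probe, round(change, 2), call)
```

Global scaling removes chip-wide differences in labelling or scanning efficiency before probes are compared; here the experimental chip is twice as bright overall, so its values are halved before the log2 differences are taken.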
Examples

Example 1 - Preparation, labelling and hybridization of genomic target DNA

Genomic DNA of Arabidopsis thaliana mutant som8 plants (O. Mittelsten Scheid et al.
(1998), Proc. Natl. Acad. Sci. USA 95, 632-637) and corresponding wild type plants is isolated
by standard biochemical procedures ensuring high molecular weight and sufficient purity for
enzymatic modifications. The DNA is subjected to endonucleolytic cleavage with the sequence-specific
endonuclease HpaII, the activity of which is blocked by the presence of 5-methylcytosine
in the recognition sequence. As a control, the same DNA sample is digested with the
endonuclease MspI. After endonucleolytic cleavage and agarose gel electrophoresis, various size
fractions are eluted from the gel and labelled using the Life Technologies DNA Labelling System
with some modifications, scaled up to 200 µl:

1. 10 µl of a DNA size fraction eluted from the gel, generally containing between
0.5 and 1.5 µg of DNA and preferably 0.5 µg of DNA, are mixed with 10 µl H2O and 80 µl of a
2.5x Random Primer Solution and placed on ice;

2. The mixture is then boiled for 5 minutes and immediately placed on ice again;

3. Thereafter 20 µl of a 10x dNTP mixture including Biotin-14-dCTP, 72 µl H2O and
8 µl Klenow fragment are added, mixed and briefly (about 10 seconds) centrifuged before
incubation at 37°C for 4-6 hours;

4. The reaction is terminated by the addition of 20 µl stop buffer;

5. DNA is precipitated by the addition of 22 µl 3 M NaAc and 440 µl ethanol
(95-99%) and leaving the DNA on dry ice for 20 minutes or at -20°C overnight;

6. The DNA is then collected by centrifugation at 4°C for 10 minutes at the maximum
speed of a table-top centrifuge, and the supernatant is discarded;

7. The pellet is then redissolved in 200 µl H2O and reprecipitated with ethanol (steps 5-6);

8. Finally the DNA pellet is dissolved in 100 µl of H2O for use in the hybridization
procedure.
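The volumes in steps 1 and 3 to 5 can be cross-checked arithmetically. This short Python sketch (the dictionary keys are descriptive labels chosen here, not reagent names from a kit manual) confirms that steps 1 and 3 add up to the stated 200 µl, that the 2.5x primer solution and the 10x dNTP mixture both end at 1x, and that the 22 µl NaAc and 440 µl ethanol of step 5 correspond to 0.1 and 2 volumes of the 220 µl stopped reaction.

```python
# Volumes in microlitres, copied from labelling steps 1 and 3 to 5.
# The dictionary keys are descriptive labels chosen for this sketch.
step1 = {"DNA size fraction": 10, "H2O": 10, "2.5x random primer solution": 80}
step3 = {"10x dNTP mixture": 20, "H2O": 72, "Klenow fragment": 8}
stop_buffer = 20   # step 4
naac_3m = 22       # step 5, 3 M sodium acetate
ethanol = 440      # step 5, 95-99% ethanol

reaction = sum(step1.values()) + sum(step3.values())            # labelling volume
primer_final_x = 2.5 * step1["2.5x random primer solution"] / reaction
dntp_final_x = 10 * step3["10x dNTP mixture"] / reaction
stopped = reaction + stop_buffer                                # volume before precipitation

print(reaction, primer_final_x, dntp_final_x)  # 200 1.0 1.0
print(naac_3m / stopped, ethanol / stopped)     # 0.1 2.0
```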

The commercially available GeneChip® Arabidopsis Genome Array used in this example
contains probe sets interrogating more than 8,000 genes and more than 100 EST clusters of
Arabidopsis thaliana. Eighty percent of the genes represented on the array are predicted coding
sequences from genomic BAC entries. Twenty percent are from high-quality cDNA sequences.
The EST clusters share homology with the predicted coding sequences from BAC clones.
The labelled target DNA of Example 1 is hybridized to the Arabidopsis Genome Array
and further analyzed as described in Example 1 of EP-A-999 285 and corresponding US Patent
No. 6,203,989.


Example 2 - Methylation patterns in som8 plants as compared to wild type plants

Using gene-by-gene analysis, it has been shown previously that the DNA fraction which
is preferentially demethylated in som8 Arabidopsis mutants is composed of remnants of
transposons and of repetitive DNA. The experimental data derived from the hybridization
studies described in this example, in which the methylation status of more than 8000 genes is
studied in a single experiment, are in agreement with the previous results and additionally
provide a direct, unbiased and broader picture of genome-wide DNA methylation changes in
som8 plants as compared to wild type plants. Of the 8000 genes studied, 124 can be
characterized as being related to transposable elements, and the experimental data confirm that
transposable elements are preferentially demethylated in som8 plants as compared to the control
wild type plants. The decrease in the methylation level of the transposons correlates well with their
transcriptional reactivation in many independent examples. However, a subset of transposons,
although demethylated, remains transcriptionally inactive. In addition, it is found that selected
genes and members of multigene families are also subjected to demethylation and transcriptional
reactivation in som8 plants, similar to the subset of transposons. Among this group of genes,
those encoding pathogen resistance determinants are the most prominent examples.
Claims

1. A method to detect differences of genomic methylation comprising,

(a) separately cleaving different samples of genomic DNA with a sequence
specific endonuclease whose cleavage activity is inhibited by or requires the presence of
5-methylcytosine or N6-methyladenine in the recognition sequence;

(b) labelling a defined size fraction of the resulting DNA fragments;

(c) separately hybridizing the labelled DNA fractions of step (b) to an array of
DNA molecules representing a plurality of genomic DNA targets; and

(d) quantifying the differences of the hybridization intensity patterns obtained in
step (c).

2. The method of claim 1 additionally comprising,

(i) cleaving identical samples of genomic DNA with a sequence specific
endonuclease whose cleavage activity is indifferent to the presence of 5-methylcytosine
or N6-methyladenine in the recognition sequence;

(ii) labelling the same size fraction of the resulting DNA fragments as in (b); and

(iii) separately hybridizing the labelled DNA fractions of (ii) as a control to an
identical array of DNA molecules representing a plurality of genomic DNA targets.

3. The method of claim 2, wherein the endonuclease inhibited by the presence of
5-methylcytosine or N6-methyladenine and the endonuclease indifferent to the presence of
5-methylcytosine or N6-methyladenine cleave at the same recognition sequence.

4. The method of claim 1, wherein the recognition sequence of the endonuclease has a
length of 4 base pairs.

5. The method of claim 4, wherein the recognition sequence of the endonuclease is ACGT,
GCGC, CCGG, TCGA or CGCG.

6. The method of claim 1, wherein labelling of the DNA fragments is performed by random
DNA priming.


INTERNATIONAL SEARCH REPORT

International application No.
PCT/US02/28529

A. CLASSIFICATION OF SUBJECT MATTER

IPC(7): C12Q 1/68
US CL: 435/6

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)
U.S.: 435/6; 435/91.2

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)
Please See Continuation Sheet

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category*: X; Y; Y; Y; Y,P; X,P; X,P

Citation of document, with indication, where appropriate, of the relevant passages:

WO 90/005195 A1 (DNA PLANT TECHNOLOGY CORPORATION) 17 May 1990 (17.05.90), pages 17-21.

CAVALLINI et al. Nuclear DNA changes within Helianthus annuus L.: variations in the amount and methylation of repetitive DNA within homozygous progenies. Theor. Appl. Genet. 1996, Vol. 92, pages 285-91.

LOO et al. DNA Methylation Patterns of the Globin Genes in Human Fetal and Adult Erythroid Tissues. American Journal of Hematology. 1992, Vol. 39, pages 289-293.

US 5,683,fr7^ (RUDERT et al) 04 November 1997 (04.11.1997), entire document.

US 6,322,971 B1 (CHETVERIN et al) 27 November 2001 (27.11.01), entire document.

US 2002/0042063 A1 (KILIAN) 11 April 2002 (04.11.02), paragraph 24.

US 2002/0006623 A1 (BRADLEY et al) 17 January 2002 (01.17.02), paragraphs 110-113.

Relevant to claim No.: 1, 4, 5, 6; 1-6; 1-6; 1-6; 1-6; 1, 4, 5, 6; 1-6

□ Further documents are listed in the continuation of Box C.

□ See patent family annex.

* Special categories of cited documents:

"A" document defining the general state of the art which is not considered to be of particular relevance
"E" earlier application or patent published on or after the international filing date
"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or other means
"P" document published prior to the international filing date but later than the priority date claimed
"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention
"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone
"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art
"&" document member of the same patent family

Date of the actual completion of the international search
30 October 2002 (30.10.2002)

Date of mailing of the international search report

Name and mailing address of the ISA/US
Commissioner of Patents and Trademarks
Box PCT
Washington, D.C. 20231
Facsimile No. (703)305-3230

Authorized officer
Juliet C. Hinsmann
Telephone No. (703)

Form PCT/ISA/210 (second sheet) (July 1998)




Continuation of B. FIELDS SEARCHED Item 3:

medline, biosis, caplus; key words: methylation, hybridization, array, microarray, biochip, sensitive, insensitive, restriction enzyme/endonuclease


